Fix EH clause sort to handle try-in-handler nesting#8428
Fix EH clause sort to handle try-in-handler nesting#8428
Conversation
BenchmarksBenchmark execution time: 2026-04-10 16:42:25 Comparing candidate commit 6b6b9a4 in PR branch Found 29 performance improvements and 40 performance regressions! Performance is the same for 211 metrics, 8 unstable metrics.
|
|
Closing this as I just thought it would be easier to share this as a PR form than not, if this is valuable can always pick it back up but we don't know if this actually solves anything. |
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8428) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (72ms) : 70, 75
master - mean (72ms) : 69, 75
section Bailout
This PR (8428) - mean (76ms) : 74, 79
master - mean (76ms) : 74, 78
section CallTarget+Inlining+NGEN
This PR (8428) - mean (1,072ms) : 1029, 1115
master - mean (1,064ms) : 1026, 1102
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (113ms) : 108, 118
master - mean (112ms) : 108, 115
section Bailout
This PR (8428) - mean (115ms) : 111, 119
master - mean (114ms) : 111, 116
section CallTarget+Inlining+NGEN
This PR (8428) - mean (789ms) : 763, 814
master - mean (783ms) : 764, 802
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (100ms) : 97, 103
master - mean (99ms) : 95, 103
section Bailout
This PR (8428) - mean (101ms) : 98, 103
master - mean (101ms) : 98, 105
section CallTarget+Inlining+NGEN
This PR (8428) - mean (937ms) : 896, 978
master - mean (936ms) : 904, 969
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (98ms) : 94, 103
master - mean (98ms) : 95, 101
section Bailout
This PR (8428) - mean (98ms) : 96, 101
master - mean (99ms) : 97, 101
section CallTarget+Inlining+NGEN
This PR (8428) - mean (812ms) : 778, 845
master - mean (814ms) : 780, 848
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (201ms) : 198, 205
master - mean (201ms) : 196, 205
section Bailout
This PR (8428) - mean (204ms) : 201, 207
master - mean (204ms) : 201, 206
section CallTarget+Inlining+NGEN
This PR (8428) - mean (1,176ms) : 1134, 1218
master - mean (1,182ms) : 1132, 1232
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (290ms) : 283, 296
master - mean (288ms) : 284, 292
section Bailout
This PR (8428) - mean (289ms) : 286, 293
master - mean (288ms) : 285, 291
section CallTarget+Inlining+NGEN
This PR (8428) - mean (965ms) : 943, 986
master - mean (963ms) : 936, 989
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (283ms) : 277, 289
master - mean (283ms) : 278, 288
section Bailout
This PR (8428) - mean (282ms) : 278, 285
master - mean (282ms) : 277, 287
section CallTarget+Inlining+NGEN
This PR (8428) - mean (1,156ms) : 1114, 1199
master - mean (1,158ms) : 1124, 1191
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8428) - mean (280ms) : 275, 284
master - mean (282ms) : 277, 287
section Bailout
This PR (8428) - mean (281ms) : 278, 284
master - mean (282ms) : 276, 288
section CallTarget+Inlining+NGEN
This PR (8428) - mean (1,040ms) : 996, 1085
master - mean (1,041ms) : 990, 1092
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
## Summary of changes This reproduces the crash [described in this README](https://github.com/DataDog/dd-trace-dotnet/blob/82cc4cf0e29478fd2fdc41afdf764a2abf662199/repro/APMS-19196/README.md) on linux x64 arch. Aim is to merge to #8428 First tested it's crashing on master <img width="2830" height="851" alt="image" src="https://github.com/user-attachments/assets/96e882fa-308e-44b6-8116-c93c7d5b7bb2" /> ## Reason for change ## Implementation details ## Test coverage ## Other details <!-- Fixes #{issue} --> <!--⚠️ Note: Where possible, please obtain 2 approvals prior to merging. Unless CODEOWNERS specifies otherwise, for external teams it is typically best to have one review from a team member, and one review from apm-dotnet. Trivial changes do not require 2 reviews. MergeQueue is NOT enabled in this repository. If you have write access to the repo, the PR has 1-2 approvals (see above), and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #apm-dotnet channel in Slack. -->
Snapshots difference summaryThe following differences have been observed in committed snapshots. It is meant to help the reviewer. 3 occurrences of : + },
+ {
+ TraceId: Id_1,
+ SpanId: Id_62,
+ Name: trace.annotation,
+ Resource: ExtremeExceptionHandling.DeepNestedExceptionHandlingSync,
+ Service: Samples.TraceAnnotations,
+ ParentId: Id_2,
+ Tags: {
+ component: trace,
+ env: integration_tests,
+ language: dotnet,
+ version: 1.0.0
+ }
3 occurrences of : + },
+ {
+ TraceId: Id_1,
+ SpanId: Id_62,
+ Name: trace.annotation,
+ Resource: ExtremeExceptionHandling.DeepNestedExceptionHandlingSync,
+ Service: Samples.TraceAnnotations.VersionMismatch.AfterFeature,
+ ParentId: Id_2,
+ Tags: {
+ component: trace,
+ env: integration_tests,
+ language: dotnet,
+ version: 1.0.0
+ }
3 occurrences of : + },
+ {
+ TraceId: Id_1,
+ SpanId: Id_62,
+ Name: trace.annotation,
+ Resource: ExtremeExceptionHandling.DeepNestedExceptionHandlingSync,
+ Service: Samples.TraceAnnotations.VersionMismatch.BeforeFeature,
+ ParentId: Id_2,
+ Tags: {
+ component: trace,
+ env: integration_tests,
+ language: dotnet,
+ version: 1.0.0
+ }
|
Add original index as final tiebreaker so std::sort does not reorder catch/filter handlers that share the same try region. Without this, multiple catch blocks for the same try can be silently swapped, changing which handler the runtime evaluates first.
Extract IsProperlyContained helper, use explicit types, add braces on all control flow, and use named variables in the sort comparator to match surrounding native code style.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6b6b9a4b4b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| std::sort(clauses, clauses + 5, [](const EHClause& a, const EHClause& b) { | ||
| return a.m_pTryBegin->m_offset > b.m_pTryBegin->m_offset && | ||
| a.m_pTryEnd->m_offset < b.m_pTryEnd->m_offset; | ||
| }); |
There was a problem hiding this comment.
Remove undefined comparator from regression test
OldComparatorFailsMiddlewareScenario calls std::sort with a comparator that is not a strict weak ordering, so the resulting order is undefined by the C++ standard; on different STL implementations or optimization levels, new3Index can legitimately end up before orig1Index, making this test flaky even when production code is correct. Please make this regression deterministic (for example by asserting pairwise comparator behavior directly, or by validating the old logic without invoking std::sort).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This is testing the old behavior that we won't have anymore, we proved it was failing, but maybe we could remove this test?
There was a problem hiding this comment.
Pull request overview
Fixes EH clause ordering in ILRewriter::Export() to correctly handle clauses whose try region is nested inside another clause’s handler region, preventing CLR InvalidProgramException during JIT validation (APMS-19196).
Changes:
- Replaces the previous EH
std::sortcomparator withILRewriter::SortEHClauses, sorting by computed nesting depth (try-in-try and try-in-handler) with a strict weak ordering. - Adds native unit tests for EH sorting plus a managed “extreme EH” integration scenario to reproduce the crash.
- Updates integration test expectations/snapshots to account for the additional traced method.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tracer/src/Datadog.Tracer.Native/il_rewriter.cpp | Uses new SortEHClauses in Export() and implements depth-based EH sorting. |
| tracer/src/Datadog.Tracer.Native/il_rewriter.h | Declares ILRewriter::SortEHClauses for reuse/unit testing. |
| tracer/test/Datadog.Tracer.Native.Tests/il_rewriter_eh_sort_test.cpp | Adds gtest coverage for try-in-try and try-in-handler ordering, including a repro of the old comparator failure. |
| tracer/test/Datadog.Tracer.Native.Tests/Datadog.Tracer.Native.Tests.vcxproj | Includes the new native test source in the VS project. |
| tracer/test/Datadog.Tracer.Native.Tests/Datadog.Tracer.Native.Tests.vcxproj.filters | Adds the new native test source to filters for VS organization. |
| tracer/test/Datadog.Tracer.Native.Tests/CMakeLists.txt | Includes the new test source in the CMake test target. |
| tracer/test/test-applications/integrations/Samples.TraceAnnotations/ExtremeExceptionHandling.cs | Adds a managed method with deep EH nesting used as a regression repro. |
| tracer/test/test-applications/integrations/Samples.TraceAnnotations/ProgramHelpers.cs | Executes the new extreme EH scenario as part of the sample run. |
| tracer/test/test-applications/integrations/Samples.TraceAnnotations.VersionMismatch.BeforeFeature/Samples.TraceAnnotations.VersionMismatch.BeforeFeature.csproj | Ensures the new managed file is compiled into the version-mismatch sample. |
| tracer/test/test-applications/integrations/Samples.TraceAnnotations.VersionMismatch.AfterFeature/Samples.TraceAnnotations.VersionMismatch.AfterFeature.csproj | Ensures the new managed file is compiled into the version-mismatch sample. |
| tracer/test/Datadog.Trace.ClrProfiler.IntegrationTests/TraceAnnotationsTests.cs | Updates expected span count and sets DD_TRACE_METHODS to trace the new sync repro method. |
| tracer/test/snapshots/TraceAnnotationsAutomaticOnlyTests._.Net.verified.txt | Snapshot updated for added span from the extreme EH method. |
| tracer/test/snapshots/TraceAnnotationsAutomaticOnlyTests._.DotNet.verified.txt | Snapshot updated for added span from the extreme EH method. |
| tracer/test/snapshots/TraceAnnotationsAutomaticOnlyTests._.Core.verified.txt | Snapshot updated for added span from the extreme EH method. |
| tracer/test/snapshots/TraceAnnotationsVersionMismatchAfterFeatureTests._.Net.verified.txt | Snapshot updated for added span from the extreme EH method. |
| tracer/test/snapshots/TraceAnnotationsVersionMismatchAfterFeatureTests._.DotNet.verified.txt | Snapshot updated for added span from the extreme EH method. |
| tracer/test/snapshots/TraceAnnotationsVersionMismatchAfterFeatureTests._.Core.verified.txt | Snapshot updated for added span from the extreme EH method. |
| tracer/test/snapshots/TraceAnnotationsVersionMismatchBeforeFeatureTests._.Net.verified.txt | Snapshot updated for added span from the extreme EH method. |
| tracer/test/snapshots/TraceAnnotationsVersionMismatchBeforeFeatureTests._.DotNet.verified.txt | Snapshot updated for added span from the extreme EH method. |
| tracer/test/snapshots/TraceAnnotationsVersionMismatchBeforeFeatureTests._.Core.verified.txt | Snapshot updated for added span from the extreme EH method. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #include "pch.h" | ||
|
|
||
| #include "../../src/Datadog.Tracer.Native/il_rewriter.h" | ||
|
|
||
| // Helper to build a doubly-linked chain of ILInstr nodes with specified offsets. | ||
| // Returns a heap-allocated array; the caller must delete[]. | ||
| // Each node's m_pNext and m_pPrev are wired up in sequence. The last node's | ||
| // m_pNext points to a sentinel (the first node acts as its own sentinel for | ||
| // handler-end calculations via m_pHandlerEnd->m_pNext). | ||
| static ILInstr* MakeInstrChain(const std::vector<unsigned>& offsets) | ||
| { | ||
| auto n = offsets.size(); | ||
| auto* instrs = new ILInstr[n]; | ||
| memset(instrs, 0, n * sizeof(ILInstr)); |
There was a problem hiding this comment.
This test file uses std::vector, std::sort, and memset but does not include the corresponding standard headers. With the current pch.h (only gtest + CLR headers), this will not compile on a clean build. Add explicit includes for the headers you use (e.g., , , ).
| // Helper to build a doubly-linked chain of ILInstr nodes with specified offsets. | ||
| // Returns a heap-allocated array; the caller must delete[]. | ||
| // Each node's m_pNext and m_pPrev are wired up in sequence. The last node's | ||
| // m_pNext points to a sentinel (the first node acts as its own sentinel for | ||
| // handler-end calculations via m_pHandlerEnd->m_pNext). | ||
| static ILInstr* MakeInstrChain(const std::vector<unsigned>& offsets) | ||
| { | ||
| auto n = offsets.size(); | ||
| auto* instrs = new ILInstr[n]; | ||
| memset(instrs, 0, n * sizeof(ILInstr)); | ||
| for (size_t i = 0; i < n; i++) | ||
| { | ||
| instrs[i].m_offset = offsets[i]; | ||
| instrs[i].m_pPrev = (i > 0) ? &instrs[i - 1] : &instrs[n - 1]; | ||
| instrs[i].m_pNext = (i < n - 1) ? &instrs[i + 1] : &instrs[0]; | ||
| } | ||
| return instrs; |
There was a problem hiding this comment.
MakeInstrChain builds a circular list (last->next points back to the first node), but the comment describes a sentinel node for handler-end calculations. This can be misleading, and it also doesn’t model ILRewriter’s real list where the last instruction’s m_pNext points to a dedicated sentinel with m_offset == code size (important when a handler ends at the end of the method). Consider either updating the comment to reflect the circular list, or creating an explicit sentinel node in the helper to better match production semantics.
anna-git
left a comment
There was a problem hiding this comment.
Added a few comments, after trying to wrap my head around it 😅
From what I understand, the new algorithm does:
- calculate depth of each exception handling clause
- create a blank array the size of the clauses
- compare eh clause's depth 2 by 2 and reorder indices according to it
a. if they have different depths, sort the new indices array according to it, putting the deeper depth first (ecma rule)
b. if depths are the same, either sort by try begin offset
c. either the offsets are the same, then just keep the original order
| // All values are IL offsets using exclusive ends (first offset past the region). | ||
| static bool IsProperlyContained(unsigned innerBegin, unsigned innerEnd, unsigned outerBegin, unsigned outerEnd) | ||
| { | ||
| bool contained = innerBegin >= outerBegin && innerEnd <= outerEnd; |
There was a problem hiding this comment.
NIT with early exit
| bool contained = innerBegin >= outerBegin && innerEnd <= outerEnd; | |
| bool identical = innerBegin == outerBegin && innerEnd == outerEnd; | |
| if (identical) | |
| { | |
| return false; | |
| } | |
| bool contained = innerBegin >= outerBegin && innerEnd <= outerEnd; | |
| return contained; |
| } | ||
|
|
||
| std::sort(indices.get(), indices.get() + nEH, [&](unsigned a, unsigned b) { | ||
| if (depth[a] != depth[b]) |
There was a problem hiding this comment.
| if (depth[a] != depth[b]) | |
| // Primary (inner try first): e.g. try { try { } catch { } } catch { } | |
| if (depth[a] != depth[b]) |
| return depth[a] > depth[b]; | ||
| } | ||
|
|
||
| unsigned offsetA = pEH[a].m_pTryBegin->m_offset; |
There was a problem hiding this comment.
| unsigned offsetA = pEH[a].m_pTryBegin->m_offset; | |
| // Secondary (same depth, different tries — order by try start offset): e.g. | |
| // try { } catch { } try { } catch { } | |
| unsigned offsetA = pEH[a].m_pTryBegin->m_offset; |
|
|
||
| // Preserve original order for clauses with equal depth and try offset | ||
| // (e.g. multiple catch handlers for the same try block). | ||
| return a < b; |
There was a problem hiding this comment.
| return a < b; | |
| // Tertiary (same try — keep compiler order): e.g. try { } catch (A) { } catch (B) { } | |
| return a < b; |
| std::sort(clauses, clauses + 5, [](const EHClause& a, const EHClause& b) { | ||
| return a.m_pTryBegin->m_offset > b.m_pTryBegin->m_offset && | ||
| a.m_pTryEnd->m_offset < b.m_pTryEnd->m_offset; | ||
| }); |
There was a problem hiding this comment.
This is testing the old behavior that we won't have anymore, we proved it was failing, but maybe we could remove this test?
| #include "pch.h" | ||
|
|
||
| #include "../../src/Datadog.Tracer.Native/il_rewriter.h" | ||
|
|
There was a problem hiding this comment.
Copilot complaining about this:
| #include <cstring> | |
| #include <vector> |
| // Each node's m_pNext and m_pPrev are wired up in sequence. The last node's | ||
| // m_pNext points to a sentinel (the first node acts as its own sentinel for | ||
| // handler-end calculations via m_pHandlerEnd->m_pNext). |
There was a problem hiding this comment.
| // Each node's m_pNext and m_pPrev are wired up in sequence. The last node's | |
| // m_pNext points to a sentinel (the first node acts as its own sentinel for | |
| // handler-end calculations via m_pHandlerEnd->m_pNext). | |
| // For simplicity, nodes are linked in a ring (last points to first). This is not production IL layout; it avoids a separate sentinel while keeping every node with a valid successor. | |
| // SortEHClauses uses m_pHandlerEnd->m_pNext->m_offset as the exclusive end of a handler region (same convention as ILRewriter::ImportEH). |
Summary of changes
Fix incorrect EH clause ordering in
ILRewriter::Export()that causedInvalidProgramExceptionwhen instrumenting methods with try blocks nested inside handler regions.Reason for change
When we instrument a method, the IL rewriter may insert try/catch blocks inside the handler of an existing try/catch (e.g., debugger inserting
EndMethod(SetException)inside an async state machine's outer catch, or CallTarget instrumentation wrapping handler logic). ECMA-335 II.19 requires such nested clauses to appear before the clause whose region encloses them in the EH table.The existing sort comparator only checked try-in-try containment:
return a.m_pTryBegin->m_offset > b.m_pTryBegin->m_offset && a.m_pTryEnd->m_offset < b.m_pTryEnd->m_offset;Because the inserted clause's try region starts after the outer clause's try region ends, this check returns false. The clause stays in its appended position -- after the outer clause -- violating ECMA-335. The CLR validates EH table ordering on JIT and rejects the method with
InvalidProgramException.The comparator also violated strict weak ordering required by
std::sort(transitivity of incomparability fails for sibling clauses adjacent to a nesting relationship), making its behavior undefined.Implementation details
Replaced the inline
std::sortwithILRewriter::SortEHClauses, a new static method that:Computes a nesting depth for each clause: the number of other clauses whose try or handler region strictly contains this clause's try region. This correctly detects both try-in-try and try-in-handler nesting per ECMA-335 II.19.
Sorts by (depth descending, try offset ascending), which is a total order and therefore satisfies strict weak ordering. Deeper-nested clauses sort first, satisfying the ECMA-335 requirement.
The method is extracted as a
staticto make it directly unit-testable without requiringICorProfilerInfo. Internal allocations usestd::unique_ptrto prevent memory leaks if an allocation fails.Test coverage
Added
ILRewriterEHSortTestinDatadog.Tracer.Native.Testswith 8 Google Test cases covering:EndMethod(SetException)clause is ordered before the outer clause whose handler contains itThe
OldComparatorFailsMiddlewareScenariotest applies the original comparator inline against the middleware topology and asserts it produces the wrong ordering, confirming the bug is real and the new tests would catch a regression.Added integration test in
tracer/test/test-applications/integrations/Samples.TraceAnnotations/ExtremeExceptionHandling.csthat reproduces the crash. Without the fix, the test causes a CLR fatal exception.Other details
Symptom:
System.InvalidProgramException: Common Language Runtime detected an invalid programthrown on any instrumented method whose EH table contains try-in-handler nesting. Originally observed on async middlewareInvokeAsyncmethods with Exception Replay enabled, but affects any instrumentation path that goes throughILRewriter::Export().Linked to APMS-19196